device: reduce reconnect delay after server restart from 120 s to ~5 s #2
Open
full-bars wants to merge 1 commit into
Without a way to notify clients of a server restart, clients wait up to `RekeyAfterTime` (120 s) before re-handshaking with the new instance. Two changes to close that gap:

- `upLocked`: call `SendHandshakeInitiation` for every peer when the device comes up. The new server proactively reaches out to all configured peers so they can re-establish in under `RekeyTimeout` (5 s) rather than waiting for natural session expiry.
- `DrainPeers` / `Config.Drain`: add an explicit drain signal. Calling `DrainPeers()` (or setting `Drain: true` in `IpcSet2`) expires all current keypairs so the next send from each client triggers an immediate re-handshake instead of silently using a session the restarted server no longer knows about.
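A minimal, self-contained sketch of the shape of the `upLocked` change, assuming simplified stand-in types. `Device`, `Peer`, their fields, and the error-collecting return value are hypothetical and are not wireguard-go's real definitions; only the `SendHandshakeInitiation(isRetry bool) error` signature mirrors the real method. The real `upLocked` also updates the bind and starts each peer, which is omitted here.

```go
package main

import (
	"fmt"
	"sync"
)

// Peer is a hypothetical stand-in for wireguard-go's Peer. It only counts
// the calls that upLocked makes on it.
type Peer struct {
	name                string
	hasEndpoint         bool // false: this instance has no address for the peer
	persistentKeepalive int  // seconds; 0 disables it
	handshakesInitiated int
	keepalivesSent      int
}

// SendHandshakeInitiation has the same signature as the real method but only
// records the attempt. A peer without a known endpoint cannot be reached.
func (p *Peer) SendHandshakeInitiation(isRetry bool) error {
	if !p.hasEndpoint {
		return fmt.Errorf("%s: no known endpoint", p.name)
	}
	p.handshakesInitiated++
	return nil
}

// SendKeepalive stands in for the existing persistent-keepalive send.
func (p *Peer) SendKeepalive() {
	p.keepalivesSent++
}

// Device is a hypothetical stand-in holding the configured peers.
type Device struct {
	mu    sync.RWMutex
	peers []*Peer
}

// upLocked sketches the change: in addition to the existing keepalive, every
// configured peer is sent a handshake initiation as soon as the device is up.
// Failures (peers with no endpoint) are collected rather than aborting startup.
func (d *Device) upLocked() []error {
	d.mu.RLock()
	defer d.mu.RUnlock()

	var errs []error
	for _, p := range d.peers {
		if p.persistentKeepalive > 0 {
			p.SendKeepalive() // existing behaviour
		}
		// New: reach out now instead of waiting up to RekeyAfterTime.
		if err := p.SendHandshakeInitiation(false); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

func main() {
	d := &Device{peers: []*Peer{
		{name: "laptop", hasEndpoint: true, persistentKeepalive: 25},
		{name: "phone"},
	}}
	for _, err := range d.upLocked() {
		fmt.Println("not reached:", err)
	}
	for _, p := range d.peers {
		fmt.Printf("%s: %d handshake(s), %d keepalive(s)\n",
			p.name, p.handshakesInitiated, p.keepalivesSent)
	}
}
```

The peer without an endpoint shows why the sub-5 s figure is scoped to peers with a known endpoint: a freshly started instance can only initiate towards addresses it already has.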
Problem
When the WireGuard server is restarted or redeployed, clients hold session keypairs that the new instance knows nothing about. WireGuard has no protocol-level "server is going away" message, so clients sit idle until their session expires naturally at `RekeyAfterTime` (120 s). During that window, packets are silently dropped.

This was raised in the URnetwork Discord, where the userspace `wireguard` fork was identified as the right place to fix it, because the kernel implementation has no equivalent hook.

Changes
1. Server-initiated handshake on startup (`device.go`, `upLocked`)

`SendHandshakeInitiation` is now called for every configured peer when the device comes up, in addition to the existing persistent-keepalive send. The new server proactively reaches out to all peers, and clients respond within `RekeyTimeout` (5 s). Reconnect time after a restart drops from up to 120 s to under 5 s for peers with a known endpoint.

2. `DrainPeers` method + `Config.Drain` flag (`device.go` / `uapi.go`)

`DrainPeers()` calls `ExpireCurrentKeypairs()` on every peer; that method was already implemented on `Peer` but not wired up at the device level. Exhausting the send nonce makes the client's very next outbound packet trigger a fresh handshake instead of silently failing. `Config.Drain bool` exposes this via `IpcSet2`, so deployment scripts can signal a drain through the existing IPC path before bringing the old process down.

Behaviour summary

(Table not recoverable; the only surviving cell reads `PersistentKeepalive` + 15 s.)

Notes
- `bindtest` compile errors are pre-existing on `master` and unrelated to these changes (`./device/...` builds cleanly).
- A `CloseNotify` message would be the cleanest long-term fix, but these two hooks address the problem without any protocol changes and are backwards-compatible.
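A similarly hedged sketch of the drain hook from change 2. It models only the mechanism the description relies on: expiring a keypair pins its send nonce at `RejectAfterMessages` (the constant's value is copied from wireguard-go), and the send path treats an exhausted nonce as a reason to initiate a handshake instead of encrypting. `Keypair`, `Peer`, `Device`, `Config`, `Send`, and `applyConfig` are hypothetical stand-ins; the fork's actual `IpcSet2` and `Config` definitions are not shown in this description.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RejectAfterMessages has the same value as the wireguard-go constant: once a
// keypair's send nonce reaches it, the keypair must not be used for sending.
const RejectAfterMessages = (1 << 64) - (1 << 13) - 1

// Keypair is a hypothetical stand-in keeping only the send-nonce counter.
type Keypair struct {
	sendNonce atomic.Uint64
}

// Peer is a hypothetical stand-in with a single current keypair.
type Peer struct {
	name       string
	current    *Keypair
	handshakes int
}

// ExpireCurrentKeypairs models the part of the existing Peer method relevant
// here: it marks the current keypair's send nonce as exhausted.
func (p *Peer) ExpireCurrentKeypairs() {
	if p.current != nil {
		p.current.sendNonce.Store(RejectAfterMessages)
	}
}

// Send models the send-path check: a missing or exhausted keypair triggers a
// handshake initiation instead of encrypting with the stale session.
func (p *Peer) Send() string {
	if p.current == nil || p.current.sendNonce.Load() >= RejectAfterMessages {
		p.handshakes++
		return "handshake"
	}
	p.current.sendNonce.Add(1)
	return "data"
}

// Device is a hypothetical stand-in holding the configured peers.
type Device struct {
	peers []*Peer
}

// DrainPeers is the device-level hook: expire every peer's current keypair.
func (d *Device) DrainPeers() {
	for _, p := range d.peers {
		p.ExpireCurrentKeypairs()
	}
}

// Config is a hypothetical stand-in for the fork's Config; only Drain is modelled.
type Config struct {
	Drain bool
}

// applyConfig is a hypothetical stand-in for the IpcSet2 path acting on Drain.
func (d *Device) applyConfig(c Config) {
	if c.Drain {
		d.DrainPeers()
	}
}

func main() {
	d := &Device{peers: []*Peer{
		{name: "a", current: &Keypair{}},
		{name: "b", current: &Keypair{}},
	}}
	fmt.Println("before drain:", d.peers[0].Send())
	d.applyConfig(Config{Drain: true})
	for _, p := range d.peers {
		fmt.Println("after drain:", p.name, p.Send())
	}
}
```

In this model the stale keypair is never replaced, because no handshake response is simulated; in a real device, completing the handshake installs a fresh keypair and normal sending resumes.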